ABSTRACT

This paper introduces the concept of adaptive temporal compressive
sensing (CS) for video. We propose a CS algorithm
to adapt the compression ratio based on the scene's tempo- 
ral complexity, computed from the compressed data, without 
compromising the quality of the reconstructed video. The 
temporal adaptivity is manifested by manipulating the inte- 
gration time of the camera, opening the possibility to real- 
time implementation. The proposed algorithm is a general- 
ized temporal CS approach that can be incorporated with a 
diverse set of existing hardware systems. 

Index Terms — Video compressive sensing, temporal 
compressive sensing ratio design, temporal superresolution, 
adaptive temporal compressive sensing, real-time implemen- 
tation. 

1. INTRODUCTION 

Video compressive sensing (CS), a new application of CS, has
recently been investigated to capture high-speed videos at a low
frame rate by means of temporal compression [1, 2, 3].^1 A
commonality of these video CS systems is the use of per-pixel 
modulation during one integration time-period, to overcome 
the spatio-temporal resolution trade-off in video capture. As 
a consequence of active [1, 2] and passive pixel-level coding
strategies [3] (see Fig. 1), it is possible to uniquely modulate
several temporal frames of a continuous video stream within 
the timescale of a single integration period of the video cam- 
era (using a conventional camera). This permits these novel 
imaging architectures to maintain high resolution in both the 
spatial and the temporal domains. Each captured frame of the 
camera is a coded temporal linear combination of the under- 
lying high-speed video frames. After acquisition, high-speed 
videos are reconstructed by various CS inversion algorithms
[9, 10, 11, 12].
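The coded measurement described above (each captured frame is the sum of
the high-speed frames, each multiplied by a shifted copy of one binary
mask) can be sketched as follows. This is a minimal illustration, not the
camera's actual forward model: the function names, the random mask, the
tiny frame size, and the circular shift are assumptions made for the
example.

```python
import random


def shift_mask(mask, s):
    """Shift a 2-D binary mask s pixels to the right (circularly, an assumption)."""
    width = len(mask[0])
    return [[row[(x - s) % width] for x in range(width)] for row in mask]


def coded_measurement(frames, mask):
    """Sum the frames after multiplying frame k element-wise by the mask shifted by k."""
    height, width = len(mask), len(mask[0])
    y = [[0.0] * width for _ in range(height)]
    for k, frame in enumerate(frames):
        code = shift_mask(mask, k)  # frame-dependent code
        for i in range(height):
            for j in range(width):
                y[i][j] += code[i][j] * frame[i][j]
    return y


# Toy data: N_F = 3 random frames of 4 x 6 pixels and one random binary mask.
rng = random.Random(0)
H, W, N_F = 4, 6, 3
mask = [[rng.randint(0, 1) for _ in range(W)] for _ in range(H)]
frames = [[[rng.random() for _ in range(W)] for _ in range(H)] for _ in range(N_F)]
y = coded_measurement(frames, mask)  # one H x W measurement for N_F frames
```

Changing the number of frames passed to coded_measurement changes the
temporal compression ratio without changing the size of the measurement.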

These hardware systems were originally designed for fixed temporal
compression ratios. The correlation in time between video frames can
vary, depending on the detailed time dependence of the scene being
imaged. For example, a scene monitored by a surveillance camera may
have significant temporal variability during the day, but at night
there may be extended time windows with no or limited changes.
Therefore, adapting the temporal compression ratio based on the
captured scene is important, not only to maintain a high quality
reconstruction, but also to save power, memory, and related resources.

^1 Significant work in spatial compression has been demonstrated with a
single-pixel camera [4, 5, 6, 7]. Unfortunately, this hardware cannot
decrease the sampling frame rate, and therefore has not been applied in
temporal CS. [8] achieved compressive temporal superresolution for
time-varying periodic scenes by exploiting their Fourier sparsity.

Fig. 1. Illustration of the coding mechanisms within the Coded Aperture
Compressive Temporal Imaging (CACTI) system [3]. The first row shows
N_F high-speed temporal frames of the source datacube video; the second
row depicts the mask with which each frame is multiplied (black is zero,
white is one). In CACTI, the same code (mask) is shifted (from left to
right) to constitute a series of frame-dependent codes. Finally, the
CACTI measurement of these N_F frames is the sum of the coded frames, as
shown at bottom right.

We introduce the concept of adaptive temporal compressive sensing to
realize a CS video system that adapts to the complexity of the scene
under test. Since each of the aforementioned cameras involves similar
integration over a time window, in which N_F high-speed video frames
are modulated/coded, we propose to adapt this time window (the
integration time N_F), to change the temporal compression ratio as a
function of the complexity of the data. Specifically, we adaptively
determine the number of frames N_F collapsed to one measurement, using
motion estimation in the compressed domain.^2

The algorithm for adaptive temporal CS can be incorporated with a
diverse range of existing video CS systems (not only the imaging
architectures in [1, 2, 3] but also flutter shutter [22, 23] cameras),
to implement real-time temporal adaptation. Furthermore, thanks to the
availability of hardware for simple motion estimation [24], the
proposed algorithm can be readily implemented in these cameras.

^2 Studies have shown that improved performance could be achieved when
projection matrices are designed to adapt to the underlying signal of
interest [13, 14, 15, 16]. However, none of these methods was developed
for video temporal CS. The adaptive CS ratio for video has been
investigated in [17, 18, 19, 20, 21]. Each frame in the video to be
captured is partitioned into several blocks based on the estimated
motion, and each block is set with a different CS ratio. Though a novel
idea, it is difficult to employ it in real cameras since it is hard to
sample at different framerates for different regions (blocks) of the
scene with an off-the-shelf camera. In contrast, the method presented
in this paper can be readily incorporated with various existing
hardware systems.

2. PROPOSED METHOD 

The underlying principle of the proposed method is to determine the
temporal compression ratio N_F based on the motion of the scene being
sensed. In the following, we propose to estimate the motion of the
objects within the scene, to adapt the compression ratio for effective
video capture.

Fig. 2. Basic principle of block-matching. Search all the P x P blocks
in the window of frame B to find the one best matched with the block in
frame A, and use this to compute the block motion.

2.1. Block-Matching Motion Estimation 

The block-matching method considered here has been employed in a
variety of video codecs ranging from MPEG-1 / H.261 to MPEG-4 / H.263
[24, 25, 26]. Diverse algorithms [25] have investigated the
block-matching concept shown in
Fig. 2. The key steps of the block-matching method are reviewed as
follows: i) partition frame A (e.g., the previous frame) into P x P
(pixels) blocks; ii) pre-define a window size M x M (pixels); iii)
search all the P x P blocks in the M x M window in frame B (e.g., the
current frame) around the selected block in frame A; and iv) find the
best matching block in the window according to some metric such as
mean squared error, and use this to compute the block motion. We
demonstrate adaptive compression ratios based on this estimated motion
from reconstructed video frames in Section 3.
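Steps i) to iv) can be sketched with an exhaustive search and the mean
squared error as the matching metric. This is an illustrative sketch
only: the function names are hypothetical, the toy frames are synthetic,
and the exhaustive search stands in for faster strategies such as the
cross-diamond search used for the results in this paper.

```python
def block_mse(a, b, ay, ax, by, bx, p):
    """Mean squared error between the P x P blocks at a[ay, ax] and b[by, bx]."""
    total = 0.0
    for dy in range(p):
        for dx in range(p):
            d = a[ay + dy][ax + dx] - b[by + dy][bx + dx]
            total += d * d
    return total / (p * p)


def match_block(frame_a, frame_b, ay, ax, p, m):
    """Displacement (dy, dx) of the best match in frame_b for one block of frame_a.

    Every P x P block inside the M x M window of frame_b centred on the
    block is tested; ties are broken in favour of the smallest displacement.
    """
    height, width = len(frame_b), len(frame_b[0])
    r = (m - p) // 2  # largest displacement allowed by the window
    best = (float("inf"), 0, (0, 0))
    for dy in range(-r, r + 1):
        for dx in range(-r, r + 1):
            by, bx = ay + dy, ax + dx
            if by < 0 or bx < 0 or by + p > height or bx + p > width:
                continue  # candidate block falls outside frame B
            err = block_mse(frame_a, frame_b, ay, ax, by, bx, p)
            key = (err, dy * dy + dx * dx, (dy, dx))
            if key < best:
                best = key
    return best[2]


def block_motion(frame_a, frame_b, p, m):
    """Motion vector of every P x P block of frame A, keyed by its top-left corner."""
    height, width = len(frame_a), len(frame_a[0])
    return {(ay, ax): match_block(frame_a, frame_b, ay, ax, p, m)
            for ay in range(0, height - p + 1, p)
            for ax in range(0, width - p + 1, p)}


# Toy example: a textured 4 x 4 patch moves by (1, 2) pixels between the frames.
A = [[0.0] * 12 for _ in range(12)]
B = [[0.0] * 12 for _ in range(12)]
for i in range(4):
    for j in range(4):
        A[4 + i][4 + j] = 1.0 + 4 * i + j
        B[5 + i][6 + j] = 1.0 + 4 * i + j
motion = block_motion(A, B, p=4, m=8)
```

In this toy case the block at (4, 4) is assigned the displacement (1, 2),
while the static block at (0, 0) is assigned (0, 0).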

Estimating motion in high-speed dynamic scenes via the block-matching
method in the reconstructed video (after signal recovery) is
computationally infeasible given current reconstruction times at even
modest compression ratios. Hence, we aim to compute the adaptation of
N_F based directly on the raw (compressed) measurements without the
intermediate step of reconstruction. The following section proposes a
method to estimate motion based solely on low-framerate, coded
measurements from the camera.

2.2. Real-Time Block-Matching Motion Estimation 

Estimating motion from the camera-captured data requires the motion to
be observable without reconstructing the video frames from the
measurement. Fig. 3 presents the underlying principle of the real-time
block-matching motion estimation approach. From this figure, it is
apparent that the scene's motion is observable within the
time-integrated coding structure. This property lets us employ the
block-matching method directly on raw measurements (frames A and B in
Fig. 3) to estimate the scene's motion. Adapting the compression ratio
N_F online is feasible due to the computational simplicity of this
method.

Fig. 3. Real-time motion estimation by block-matching.




Fig. 4. Segmentation of foreground and background by motion estimation
from compressed measurements. Left is the original measurement; middle
is the background blocks with foreground blocks shown in black; and the
right part presents the foreground blocks with background blocks shown
in black. Note that the aim of this work is to estimate the motion, not
segmentation. This primary segmentation helps us to localize the moving
parts of the scene. A 16 x 16 (P = 16) block size is used and the
window size is defined as 40 x 40 (M = 40). The cross-diamond search
algorithm [27] has been used to generate this figure and the subsequent
results in Section 3.

By thresholding the measurement from the moving pixels estimated for
each block, we can also roughly segment the scene into the foreground
and the background (Fig. 4). Notably, we adapt N_F solely based on the
estimated motion velocity V (pixels/frame) for the fastest-moving
blocks of the scene.
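As a rough sketch of this step, the block motion vectors can be
thresholded to label foreground blocks, and the fastest block can be
converted into a velocity in pixels per high-speed frame. The function
names and the one-pixel threshold are hypothetical, and dividing the
displacement by the number of high-speed frames separating the two
measurements is an assumption of this sketch rather than a detail stated
in the text.

```python
import math


def foreground_blocks(motion, min_shift=1.0):
    """Blocks whose displacement reaches min_shift pixels (hypothetical threshold)."""
    return {pos for pos, (dy, dx) in motion.items()
            if math.hypot(dy, dx) >= min_shift}


def scene_velocity(motion, frames_between):
    """Velocity (pixels/frame) of the fastest-moving block.

    Assumption: the two measurements are frames_between high-speed frames
    apart, so the measured displacement is divided by that number.
    """
    if not motion:
        return 0.0
    fastest = max(math.hypot(dy, dx) for dy, dx in motion.values())
    return fastest / frames_between


# Toy motion field: block position -> (dy, dx) displacement in pixels.
motion_field = {(0, 0): (0, 0), (0, 16): (0, 8), (16, 0): (3, 4)}
moving = foreground_blocks(motion_field)
v = scene_velocity(motion_field, frames_between=8)
```

Here two of the three blocks are labelled as foreground, and the fastest
one (8 pixels over 8 frames) gives a velocity of 1 pixel/frame.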

Intuitively, the compression ratio required to faithfully reconstruct
the scene's motion is inversely proportional to the detected velocity
V, e.g., N_F = C/V, where C is a constant that depends on the scene. In
practice, we simply apply a look-up table to adapt N_F discretely with
few computations. See Table 1 for an example. Since good hardware
exists for motion estimation [24], the proposed method can be
implemented in real time.
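Both rules can be sketched in a few lines. The velocity thresholds and
the constant C below are hypothetical placeholders chosen for the
example; only the N_F values 16, 8 and 6 are taken from the experiments
reported later, and Table 1 holds the mapping actually used.

```python
# (upper velocity bound in pixels/frame, N_F) pairs. The bounds are hypothetical.
HYPOTHETICAL_TABLE = [(0.1, 16), (1.0, 8)]
FASTEST_N_F = 6  # used when the velocity exceeds every bound in the table


def lookup_n_f(v, table=HYPOTHETICAL_TABLE, default=FASTEST_N_F):
    """Discrete rule: return the N_F of the first velocity bin that contains v."""
    for v_max, n_f in table:
        if v <= v_max:
            return n_f
    return default


def inverse_rule_n_f(v, c, n_min=1, n_max=16):
    """Continuous rule N_F = C / V, rounded and clamped to [n_min, n_max]."""
    if v <= 0:
        return n_max  # static scene: use the largest allowed compression ratio
    return max(n_min, min(n_max, round(c / v)))


static_n_f = lookup_n_f(0.0)          # no motion
moderate_n_f = lookup_n_f(0.5)        # moderate motion
fast_n_f = lookup_n_f(2.0)            # fast motion
rule_n_f = inverse_rule_n_f(1.0, c=8.0)
```

The look-up form needs only a few comparisons per measurement, which is
what makes it attractive for online use.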

It is worth noting that the estimated motion, and hence the N_F
selected from the present measurement, is applied to the upcoming
frames. We assume that motion is consistent across adjacent frames in
the video. Sudden changes of the motion will therefore result in a
delay of one integration time in the N_F adaptation. Simulation results
in Fig. 5 verify this point. We can of course put an upper bound on
N_F.
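The one-integration-time delay can be seen in a minimal sketch of the
update loop. The function names, the toy velocity sequence, and the
two-level selection rule are assumptions made for illustration.

```python
def adapt_schedule(velocities, choose_n_f, initial_n_f, n_max):
    """Return the N_F used to capture each measurement.

    velocities[k] is the velocity estimated from measurement k; the N_F it
    selects is only applied to measurement k + 1.
    """
    used = []
    n_f = initial_n_f
    for v in velocities:
        used.append(n_f)                 # measurement k uses the current N_F
        n_f = min(choose_n_f(v), n_max)  # estimate from k sets N_F for k + 1
    return used


def toy_rule(v):
    """Hypothetical two-level rule: static scene -> 16, moving scene -> 8."""
    return 16 if v == 0 else 8


# The scene moves, freezes for two measurements, then moves again.
schedule = adapt_schedule([1, 1, 0, 0, 1], toy_rule, initial_n_f=6, n_max=16)
```

The scene freezes at the third measurement, but N_F only rises to 16 at
the fourth, which is the delay described above.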



Fig. 5. (a) Reconstruction PSNR (dB), adaptive N_F (frames), and
velocities (pixels/frame) estimated from the original video and from
the measurements, all plotted against frame number. (b-d) Measurements
with vehicles at different velocities; panel (b) shows the measurement
of frames 99-106.

Fig. 6. (a) Ground truth; frames 1-4 are shown as examples. (b)
Selected reconstructed frames (16 shown as examples), based on the
adaptive N_F presented in Fig. 5.

Table 1. Relationships between the velocity V (pixels/frame) of the
foreground and the compression ratio N_F (frames).

Fig. 7. Reconstruction PSNR (dB), adaptive N_F (frames), and velocities
(pixels/frame) estimated from the original and reconstructed video
frames, all plotted against frame number.

3. EXPERIMENTAL RESULTS 

From [3], we have found that (based on extensive simulations)
shifting a fixed mask is as good as using the more sophisticated
time-evolving codes used in [1, 2]. For convenience (but not
necessity), the subsequent results will use a shifted mask to modulate
the high-speed video frames.

3.1. Example 1: Synthetic Traffic Video 

We illustrate the adaptive compression ratio framework on a traffic
video [28] that has 360 frames. We artificially vary the foreground
velocity for this video to evaluate the proposed method's performance
for motion estimation and N_F adaptation. Frames 1-120 (Fig. 5(b)) and
241-336 (Fig. 5(d)) run at the originally captured framerate; we freeze
the scene between frames 121-240 (Fig. 5(c)). The generalized
alternating projection (GAP) algorithm [11] is used for the
reconstructions.

Table 1 provides the compression ratio N_F corresponding to several
scene velocities V. This look-up table (learned based on training
data^3) seeks to maintain a constant reconstruction peak
signal-to-noise ratio (PSNR) of 22 dB. Fig. 5 presents the real-time
motion estimation results using simulated low-framerate coded exposures
of the traffic video with an initial compression ratio N_F = 6. After a
short fluctuation, the estimated velocity of the scene becomes
constant; N_F accordingly stabilizes at 8. When the vehicles freeze,
the block-matching algorithm senses zero change in the pixel position
and updates N_F to 16. N_F returns to 8 upon continuing video playback
at normal speed. We can also observe the consistency of the velocities
estimated from the original video and from the compressed measurements
in Fig. 5(a). Sudden changes in the video's framerate (and hence the
motion velocity V) are reflected in short fluctuations of the PSNR (for
one time-integration period) in Fig. 5(a). The average PSNR of the
reconstructed frames in Fig. 5 is 21.8 dB, very close to our
expectation (22 dB). Fig. 6 presents several reconstructed frames based
on the adaptive N_F in Fig. 5.

Fig. 8. Motion estimation and adaptive N_F from the measurements. (a)
Reconstruction PSNR (dB), adaptive N_F (frames) (average adaptive
N_F = 10.12), and velocities (pixels/frame) estimated from the
measurements, all plotted against frame number. (b-d) Measurements when
there is nothing, one person, and a couple moving inside the scene,
with adapted N_F = 16, 8, 6, respectively. (e) Reconstructed frames
539-544 from the measurement in (d) with adaptive N_F. (f)
Reconstructed frames 539-544 with nonadaptive (constant) N_F = 10
(average PSNR = 27.5445 dB).

^3 We use other traffic videos playing at different velocities
(different framerates) to learn this table. The main steps are as
follows: i) generate videos with different motion velocities by
changing the framerate; ii) estimate the motion velocities V of the
generated videos; iii) modulate the generated videos with shifting
masks and constitute measurements with diverse N_F; iv) reconstruct the
videos with GAP [11] from these compressed measurements and calculate
the PSNR of the reconstructed video; and v) build the relations between
the estimated velocities V and the N_F maintaining a constant PSNR
(around 22 dB).
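The PSNR criterion and the final selection step used to learn the
look-up table can be sketched as follows. The function names are
hypothetical, the PSNR values in the example are made-up placeholders
rather than results from this paper, and choosing the largest N_F that
still reaches the target is one plausible reading of "maintaining a
constant PSNR".

```python
import math


def psnr(reference, estimate, peak=255.0):
    """PSNR in dB between two equally sized 2-D frames."""
    squared_error = 0.0
    count = 0
    for ref_row, est_row in zip(reference, estimate):
        for r, e in zip(ref_row, est_row):
            squared_error += (r - e) ** 2
            count += 1
    mse = squared_error / count
    return math.inf if mse == 0 else 10.0 * math.log10(peak * peak / mse)


def learn_table(psnr_by_velocity, target_db):
    """For each velocity, keep the largest N_F whose PSNR reaches target_db.

    psnr_by_velocity maps a velocity to {N_F: PSNR in dB}, as obtained by
    reconstructing training videos at several compression ratios. If no
    N_F reaches the target, the smallest tested N_F is kept.
    """
    table = {}
    for v, results in psnr_by_velocity.items():
        good = [n_f for n_f, db in results.items() if db >= target_db]
        table[v] = max(good) if good else min(results)
    return table


# Placeholder PSNR values (dB), invented purely to exercise the function.
placeholder = {
    0.0: {6: 35.0, 8: 33.0, 16: 30.0},
    1.0: {6: 25.0, 8: 22.5, 16: 18.0},
}
learned = learn_table(placeholder, target_db=22.0)
```

With these placeholder numbers the static scene keeps N_F = 16 and the
moving scene keeps N_F = 8.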

We additionally evaluate the block-matching algorithm's performance by
applying it to reconstructed frames. Fig. 7 demonstrates that its
performance is similar to that shown in Fig. 5. This indicates that it
is unnecessary to reconstruct each measurement prior to updating N_F.

3.2. Example 2: Realistic Surveillance Video 

Fig. 8 implements adaptive N_F on video data captured in front of a
shop [29]. Table 1 is again used for this example.

The first 189 frames of this video (Fig. 8(b)) are stationary; nothing
is moving within the scene. As seen before, since V = 0, the
compression ratio remains at N_F = 16. After the 189th frame, different
people begin to walk in and out of the video area (Fig. 8(c-d)). The
compression ratio N_F is adapted between 6 and 16 according to the
estimated velocity. When one person walks into the shop (Fig. 8(c)),
the compression ratio drops (N_F = 8). This results in a better-posed
reconstruction of the underlying video frames. When a couple walks in
front of the shop (Fig. 8(d)), N_F drops further to 6. The
corresponding measurement and reconstructed frames are shown in
Fig. 8(d, e).

This video takes a total of 67 adaptive measurements to capture and
reconstruct 678 high-speed video frames, achieving a mean compression
ratio N_F of approximately 10.12. To demonstrate the utility of
adapting N_F based on the sensed data, we compare adaptive
reconstructions to those obtained when N_F is fixed at or near its
expected value. Fig. 8(f) shows reconstructed frames 539-544 when
fixing N_F = 10. Comparing part (e) with part (f), we notice that
adapting N_F provides a roughly 3 dB higher reconstruction quality
(average PSNR = 30.65 dB) than fixing N_F near its expected value
(average PSNR = 27.54 dB). These improvements are most noticeable
whenever there is motion within the scene and demonstrate the potency
of temporal compression ratio adaptation in realistic applications.
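As a quick check of the figures quoted above, the mean compression ratio
and the PSNR gain follow directly from the reported counts and averages
(the overbar and the Delta symbol are notation introduced here for the
check only):

```latex
\bar{N}_F = \frac{678 \text{ frames}}{67 \text{ measurements}} \approx 10.12,
\qquad
\Delta\mathrm{PSNR} = 30.65\,\mathrm{dB} - 27.54\,\mathrm{dB} = 3.11\,\mathrm{dB}.
```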

4. CONCLUSION 

We have introduced the concept of adaptive temporal com- 
pressive sensing for video and demonstrated a real-time 
method to adapt the temporal compression ratio for video 
compressive sensing. By estimating the motion of objects 
within the scene, we determine how many measurements are 
necessary to ensure a reasonably well-conditioned estimation 
of high-speed motion from lower-framerate measurements. 

A block-matching algorithm estimates the scene's motion directly from
the compressed measurements to obviate real-time reconstruction,
thereby significantly reducing the computational resources required in
real time. Simulation results have verified the efficacy of the
proposed adaptation algorithm. Future work will seek to embed this
real-time framework into the hardware prototype.